1 ggplot2 package

1.1 Introduction

ggplot2 implements the layered grammar of graphics, a system that builds visualizations from cases (observations, drawn as geometric objects) and variables (mapped to visual properties).

library(ggplot2)

mpg
# A tibble: 234 × 11
   manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
   <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
 1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
 2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
 3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
 4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
 5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
 6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
 7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
 8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
 9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
# … with 224 more rows

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy))


1.2 Visualizing variables (aesthetic mappings): aes

There are different types of aesthetics, which change the properties of the objects that are drawn:

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, size = cyl, color = class))

Example:

  • In the previous graphic, make all the dots blue (color="blue")

1.2.1 Set vs. map

  • Inside aes(): the value is treated as data. ggplot2 maps it to a visual property through a scale, so a transformation is applied before it is drawn.

  • Outside aes(): the value is treated as a visual property, and the aesthetic is set to it directly.

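# Mapped: "blue" is treated as a data value, so a scale picks the colour
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Set: the points are drawn in blue
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy), color = "blue")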
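
One way to check the difference between mapping and setting is to look at the colours ggplot2 actually computes for each layer. The sketch below assumes the ggplot2 helper layer_data(), which returns the data a layer is drawn from; the names p_mapped and p_set are illustrative and do not come from the text.

```r
library(ggplot2)

# Mapped: "blue" sits inside aes(), so it is treated as a data value
p_mapped <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Set: "blue" sits outside aes(), so it is used as the colour itself
p_set <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Colours that each layer will be drawn with
mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours <- unique(layer_data(p_set)$colour)

mapped_colours  # a single colour chosen by the default colour scale
set_colours     # the literal value "blue"
```

In the mapped version the string "blue" is only a label, which is why that plot also gets a legend.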

Example:

  • In the above chart, draw all the points with displ < 5 in one color and those with displ >= 5 in another.

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, color = displ < 5))


1.3 Facets: facet

Facets are subplots that each display a subset of the data.

  • facet_wrap(): facets by a single discrete variable
  • facet_grid(): facets by two variables (\(\texttt{rows} \sim \texttt{columns}\); use \(.\) for no split)

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class)

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Example:

  • Check what happens when . is used instead of one of the variables in the formula inside facet_grid().

ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)

ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)


1.4 Visualizing cases: geom

What is the difference between these two graphics?

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy))

ggplot(data = mpg) + 
  geom_smooth(aes(x = displ, y = hwy))

A geom is the geometric object that a plot uses to represent data.

Every geom function takes a mapping argument built with aes(), but not every aesthetic works with every geom (see each geom's reference page). For example, you can set the shape of a geom_point but not of a geom_line.
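
The aesthetics a geom understands can also be listed from R. This is a minimal sketch that assumes the ggproto objects GeomPoint and GeomLine exported by ggplot2 and their aesthetics() method; the variable names are illustrative.

```r
library(ggplot2)

# Each geom is backed by a ggproto object that can list its aesthetics
point_aes <- GeomPoint$aesthetics()
line_aes <- GeomLine$aesthetics()

"shape" %in% point_aes  # geom_point understands shape
"shape" %in% line_aes   # geom_line does not

# Aesthetics accepted by geom_point but not by geom_line
setdiff(point_aes, line_aes)
```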

Example:

  • Make 3 different figures from the previous figure using the variable drv with the aesthetics color, linetype and group.

ggplot(data = mpg) +
  geom_smooth(aes(x = displ, y = hwy, color = drv))

ggplot(data = mpg) +
  geom_smooth(
    aes(x = displ, y = hwy, linetype = drv)
  )

ggplot(data = mpg) +
  geom_smooth(aes(x = displ, y = hwy, group = drv))

1.4.1 Multiple layers

Each new geom adds a new layer to the graph.

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy))

1.4.2 Global vs local

Mappings and data included in ggplot() will be applied globally to all layers.

ggplot(data = mpg, aes(x = displ, y = hwy)) +
   geom_point() +
   geom_smooth()

Mappings and data supplied to a geom_*() function override or extend the global settings for that layer only.

ggplot(data = mpg, aes(x = displ, y = hwy)) + 
  geom_point(aes(color = class)) + 
  geom_smooth()
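
To confirm that a local mapping affects only its own layer, the computed colours of both layers can be compared. This sketch assumes the ggplot2 helper layer_data(plot, i), which returns the data of the i-th layer; p_layers and the other names are illustrative.

```r
library(ggplot2)

p_layers <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

# Layer 1 (points): color is mapped to class, one colour per class
point_colours <- unique(layer_data(p_layers, 1)$colour)

# Layer 2 (smooth): no local color mapping, so a single default colour
smooth_colours <- unique(layer_data(p_layers, 2)$colour)

length(point_colours)
length(smooth_colours)
```

Both layers still share the global x and y mappings; only the color mapping is local to the points.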

Each layer can use a different data frame.

The data argument must be specified in every geom that uses a dataset other than the one given in ggplot().

mpg_subcompact <- mpg[mpg$class == "subcompact", ]

ggplot(data = mpg, aes(x = displ, y = hwy)) + 
  geom_point(aes(color = class)) + 
  geom_smooth(data = mpg_subcompact, se = FALSE)

Example:

  • Recreate the code in R needed to generate the following figures, from: p <- ggplot(data = mpg, aes(x = displ, y = hwy))

p <- ggplot(data = mpg, aes(x = displ, y = hwy))

p + geom_point() +
  geom_smooth(se = FALSE)

p + geom_point() +
  geom_smooth(aes(group = drv), se = FALSE)

p + geom_point(aes(color = drv)) +
  geom_smooth(aes(color = drv), se = FALSE)

p + geom_point(aes(color = drv)) +
  geom_smooth(se = FALSE)

p + geom_point(aes(color = drv)) +
  geom_smooth(aes(linetype = drv), se = FALSE)

p + geom_point(aes(color = drv)) +
  geom_smooth(aes(group = drv), se = FALSE)

1.5 Position

Position adjustments determine how overlapping geoms are arranged.

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, colour = cut))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

If fill is mapped to a different variable than x, the bars are stacked automatically. This is controlled by the position argument, which defaults to "stack" in geom_bar().

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity))

Example:

  • Modify the above graphic using different values in the position parameter (“stack”, “dodge”, “identity”, “fill”).

p <- ggplot(diamonds, aes(x = cut, fill = clarity))

p + geom_bar()

p + geom_bar(position = "stack")

p + geom_bar(position = "dodge")

p + geom_bar(position = "identity")

p + geom_bar(position = "fill")
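
The effect of each position can also be read from the computed bar coordinates. The following sketch assumes the ggplot2 helper layer_data() and its ymin/ymax columns for bars; p_bars, stacked, filled and dodged are illustrative names.

```r
library(ggplot2)

p_bars <- ggplot(diamonds, aes(x = cut, fill = clarity))

stacked <- layer_data(p_bars + geom_bar(position = "stack"))
filled <- layer_data(p_bars + geom_bar(position = "fill"))
dodged <- layer_data(p_bars + geom_bar(position = "dodge"))

# "stack": segments pile up, so the tallest bar reaches the largest cut count
max(stacked$ymax)
max(table(diamonds$cut))

# "fill": stacks are rescaled so that every bar has total height 1
max(filled$ymax)

# "dodge": segments sit side by side, so each one starts at 0
range(dodged$ymin)
```

With "identity" nothing is adjusted, so the segments of one cut are drawn on top of each other and partly hide one another.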


1.6 Formal aspects of ggplot2

1.6.1 Labels: titles, axis, legend

p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()
p + 
  labs(title = "Fuel efficiency vs. Engine size",
       x = "Engine displacement (L)", 
       y = "Highway fuel efficiency (mpg)",
       color = "Type of Car",
       caption = "Data from fueleconomy.gov")

1.6.2 Scales

ggplot2 normally adds default scales automatically. Scale functions are named scale_ + the aesthetic + _ + the type of scale, for example scale_x_continuous() or scale_color_discrete().

(p <- ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(color = class)))

p +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete()

p +
  scale_color_discrete(labels = c("A", "B", "C", "D", "E", "F", "G"))

The scales of the axes can be modified:

p +
  scale_x_continuous(labels = NULL) +
  scale_y_continuous(breaks = seq(15, 40, by = 5))

p +
  scale_y_log10(breaks = seq(15, 40, by = 5))

1.6.3 Zoom

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth() + 
  coord_cartesian(xlim = c(5, 7), ylim = c(10, 30)) 
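
A zoom with coord_cartesian() only changes the visible region. As a point of comparison that is not part of the original example, the sketch below sets limits on the x scale instead, which (with ggplot2's default settings, as far as I know) replaces out-of-range values with NA before any statistics are computed. The names base, zoomed and clipped are illustrative.

```r
library(ggplot2)

base <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Zooming with the coordinate system keeps every observation
zoomed <- base + coord_cartesian(xlim = c(5, 7))

# Scale limits instead replace out-of-range values with NA
clipped <- base + scale_x_continuous(limits = c(5, 7))

n_zoomed <- sum(!is.na(layer_data(zoomed)$x))
n_clipped <- sum(!is.na(layer_data(clipped)$x))

n_zoomed   # all rows of mpg
n_clipped  # only rows with displ between 5 and 7
```

This matters for layers such as geom_smooth(): under coord_cartesian() the curve is still fitted to all the data, while scale limits would fit it only to the remaining points.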

1.6.4 Themes

You can change the appearance of elements that do not come from the data (background, grid lines, fonts) with the theme_*() functions.

p + theme_bw()

p + theme_grey()

p + theme_light()

p + theme_dark()

To make all subsequent ggplot2 figures use the same theme:

theme_set(theme_bw())

1.6.5 Additional themes

There is a package with additional themes: ggthemes

library(ggthemes)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point() +
  labs(title = "mpg")

# Economist theme
p + theme_economist()

# Economist theme + color palette
p + theme_economist() + scale_colour_economist() 

1.6.6 Define your own themes

You can also define your own themes.

theme_jesus <- function() {
  theme_bw(base_size = 12, base_family = "Courier") %+replace%
    theme(
      panel.background = element_blank(),
      plot.background = element_rect(fill = "gray96", colour = NA),
      legend.background = element_rect(fill = "transparent", colour = NA),
      legend.key = element_rect(fill = "transparent", colour = NA)
    )
}

p + theme_bw()

p + theme_jesus()

Exercise:

  • Experiment with labels, themes and scales in order to create a figure like this, based on the \(\texttt{diamonds}\) data (x: \(\texttt{carat}\), y: \(\texttt{price}\), color: \(\texttt{cut}\)):

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(aes(color = cut), se = FALSE) +
  labs(title = "Ideal cut diamonds command the best price for every carat size",
       subtitle = "Lines show GAM estimate of mean values for each level of cut",
       caption = "Data provided by Hadley Wickham",
       x = "Log Carat Size",
       y = "Log Price Size",
       color = "Cut Rating") +
  scale_x_log10() +
  scale_y_log10() +
  scale_color_brewer(palette = "Greens") +
  theme_light()

1.7 Save plots

ggsave() saves the last plot that was displayed. A relative file name is written to the working directory:

ggsave("my-plot.pdf", width = 6, height = 6)
ggsave("my-plot.png", width = 6, height = 6)
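
Relying on the last displayed plot can be fragile in scripts. A minimal sketch of saving a specific plot, using the plot argument of ggsave(); the plot p_save and the temporary output path are hypothetical and chosen only for illustration.

```r
library(ggplot2)

p_save <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Hypothetical output location; a temporary file keeps the example tidy
out_file <- file.path(tempdir(), "my-plot-explicit.pdf")

# plot = selects the plot to save instead of the last one displayed;
# width and height are in inches by default
ggsave(out_file, plot = p_save, width = 6, height = 6)

file.exists(out_file)
```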

Another way is to open a graphics device, print the plot, and close the device:

png("my-plot_4.png", width = 800, height = 600)
  print(p)
dev.off()
quartz_off_screen 
                2 

1.8 Plotly

Plotly is a JavaScript-based graphics library that generates interactive graphics.

Because the output is HTML, interactive plots can only be used in HTML-based documents.

library(plotly)

(p <- ggplotly(p))
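
Since the result is an HTML widget, it can also be written to a standalone HTML file. This sketch assumes the htmlwidgets package (installed as a dependency of plotly) and its saveWidget() function; the plot, the variable names and the temporary output path are illustrative.

```r
library(ggplot2)
library(plotly)

p_static <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

# Convert the static ggplot into an interactive htmlwidget
p_interactive <- ggplotly(p_static)

# Hypothetical output file; selfcontained = FALSE writes the JavaScript
# dependencies next to the HTML file instead of embedding them
out_html <- file.path(tempdir(), "my-plot.html")
htmlwidgets::saveWidget(p_interactive, out_html, selfcontained = FALSE)

file.exists(out_html)
```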

1.9 Display several plots at once

library(gridExtra)

p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

p2 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_smooth(aes(color = cut), se = FALSE)

grid.arrange(p1, p2, nrow = 1)

1.10 Template

2 References